---
title: "README: Presidential Venezuelan Election Archive (PRESVEN-A)"
output: md_document
---

# Presidential Venezuelan Election Archive (PRESVEN-A)

## Description

The **Presidential Venezuelan Election Archive (PRESVEN-A)** is a curated dataset of **Venezuelan presidential election results from 2006 to 2024**, compiled at three levels: **polling station**, **municipality**, and **party**. It was developed to overcome the systematic restriction of electoral data in Venezuela under authoritarian rule. The dataset integrates official results, opposition-collected tallies, and census-based demographic data to enable subnational electoral analysis despite extreme censorship.

This dataset is intended for researchers, civil society actors, journalists, and international observers interested in **electoral behavior, democratic backsliding, and regime dynamics in autocratic settings**.

## Files Included

-   `ven_elec_2006_2024.rds`: Polling station-level dataset (77,737 observations)
-   `ven_mun_elec_2006_2024.rds`: Municipal-level dataset (1,668 observations)
-   `ven_party_long_2024.rds`: Party-level dataset (953,800 observations)
-   `party_codes_2024.csv`: Unique party codes
-   `PRESVENA_codebook.pdf`: Codebook describing all variables and classifications
-   `README.Rmd`: This file
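
As a starting point, the `.rds` files can be read with base R's `readRDS()`. The sketch below assumes the files sit in the current working directory; the helper `load_presven()` and the list names are illustrative and not part of the archive.

```r
# Paths assume the files are in the working directory; adjust as needed.
files <- c(
  stations       = "ven_elec_2006_2024.rds",
  municipalities = "ven_mun_elec_2006_2024.rds",
  parties        = "ven_party_long_2024.rds"
)

# Hypothetical helper: read every file that exists, report the ones that do not.
load_presven <- function(paths) {
  found   <- paths[file.exists(paths)]
  missing <- setdiff(paths, found)
  if (length(missing) > 0) {
    message("Not found: ", paste(missing, collapse = ", "))
  }
  lapply(found, readRDS)  # returns a named list of data frames
}

presven <- load_presven(files)
```

If all three files are present, `presven$stations`, `presven$municipalities`, and `presven$parties` hold the three datasets. The party codes can be read separately with `read.csv("party_codes_2024.csv")`.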

## Coverage

-   **Temporal**: 2006--2024 (presidential elections)
-   **Geographic**: All 335 municipalities in Venezuela
-   **Level of aggregation**:
    -   Polling station (*mesa electoral*)
        -   Party/Candidate
    -   Municipality

## Data Sources

-   **Consejo Nacional Electoral (CNE)**: Official results (2006--2018), accessed via expert networks
-   **resultadosconvenezuela.com**: 2024 tallies collected by opposition civic networks
-   **Instituto Nacional de Estadística (INE)**: 2011 census data and population projections
-   **Wikidata**: Municipality surface area for population density calculations

## Key Variables

See `PRESVENA_codebook.pdf` for a complete list. Key variables include:

-   `year`: Election year
-   `of_c`, `op_c`, `otro_c`: Vote totals by political bloc
-   `turnout`, `validos`, `nulos`, `abst_c`: Participation metrics
-   `pob_2011`, `pob_proy_2020`: Demographic indicators
-   `urb_level_oecd_bin`, `urb_level_oecd_cat`: Urban-rural classifications (binary and categorical)
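
To illustrate how the bloc totals might be used, the sketch below computes bloc vote shares on a toy data frame. It assumes that `of_c`, `op_c`, and `otro_c` are vote counts for the officialist, opposition, and other blocs; the numbers are invented, and the derived columns (`of_share`, `op_share`, `margin`) are illustrative names that do not appear in the dataset. Check the codebook for the exact definitions before relying on this.

```r
# Toy rows with made-up counts; column names follow the list above.
toy <- data.frame(
  year   = c(2013, 2024),
  of_c   = c(120, 80),
  op_c   = c(100, 150),
  otro_c = c(5, 10)
)

# Share of each bloc among votes cast for any of the three blocs
# (assumption: the three blocs together cover all valid votes).
bloc_total   <- toy$of_c + toy$op_c + toy$otro_c
toy$of_share <- toy$of_c / bloc_total
toy$op_share <- toy$op_c / bloc_total

# Illustrative margin: positive when the officialist bloc leads.
toy$margin <- toy$of_share - toy$op_share
```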

## Methodological Notes

-   For **2006--2018**, only **one polling station per center** is included, as published by the CNE.
-   For **2024**, the dataset includes **multiple polling stations per center**, collected by opposition-led networks.
-   All data were cleaned and standardized in **R (2023)** using the **tidyverse** collection of packages.
-   Names and codes were harmonized by removing special characters and aligning administrative boundaries across sources.
-   Population density was calculated by combining INE population data with surface area from Wikidata.
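
The harmonization and density steps above can be sketched in base R as follows. This is a minimal illustration on invented data, not the authors' script: the helper `harmonize_name()` and the columns `mun`, `key`, `area_km2`, and `dens_2011` are hypothetical. Accented characters are written as `\u` escapes so the example does not depend on the file encoding.

```r
# Hypothetical helper: replace accented letters, uppercase,
# drop remaining special characters, and collapse whitespace.
harmonize_name <- function(x) {
  old <- "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00c1\u00c9\u00cd\u00d3\u00da\u00dc\u00d1"
  new <- "aeiouunAEIOUUN"
  x <- chartr(old, new, x)
  x <- toupper(x)
  x <- gsub("[^A-Z0-9 ]", "", x)
  trimws(gsub(" +", " ", x))
}

# Toy population table (stands in for INE data) and area table (stands in for Wikidata).
pop  <- data.frame(mun = c("Sim\u00f3n Bol\u00edvar", "Pe\u00f1a"),
                   pob_2011 = c(50000, 12000))
area <- data.frame(mun = c("SIMON BOLIVAR", "Pena"),
                   area_km2 = c(250, 400))

# Join on the harmonized name, then compute inhabitants per square kilometre.
pop$key  <- harmonize_name(pop$mun)
area$key <- harmonize_name(area$mun)
merged <- merge(pop[, c("key", "pob_2011")],
                area[, c("key", "area_km2")], by = "key")
merged$dens_2011 <- merged$pob_2011 / merged$area_km2
```

Municipality names can repeat across states, so a join on real data would likely need a state identifier in the key as well.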

| **Category**             | **Script**                            | **Description**                                                                                                                                                                                                                                                                                                  |
|-----------------|-----------------|--------------------------------------|
| **Data Handling**        | `ven_elections_handling.R`            | Processes raw electoral data, performing data cleaning, transformation, and preparation for analysis.                                                                                                                                                                                                            |
|                          | `ven_mun_data_handling.R`             | Cleans and structures municipality-level electoral data, including turnout rates, vote shares, and political alignment indicators.                                                                                                                                                                               |
|                          | `scrape_censo_2011.R`                 | Fetches and processes population data from the 2011 Venezuelan Census to integrate demographic insights into electoral analysis.                                                                                                                                                                                 |
|                          | `ven_mun_par_pob.R`                   | Aggregates population data at the municipal and parish levels to support demographic-based electoral analysis.                                                                                                                                                                                                   |
|                          | `pres_elect_2024_long_format.R`       | Transforms the 2024 presidential election dataset into long format at the polling station level, with each row representing a party--candidate vote count. The script includes party names, acronyms, bloc classification (Officialism, Opposition, Other), and geographic identifiers for merging and analysis. |
| **Descriptive Analysis** | `Descriptivos_bivariateplots.R`       | Generates bivariate descriptive plots to explore relationships between electoral variables, turnout rates, and demographic indicators.                                                                                                                                                                           |
|                          | `Descriptivos_PRESVEN.R`              | Performs descriptive analysis of Venezuelan presidential elections, covering vote shares, turnout rates, and regional patterns across different election cycles.                                                                                                                                                 |
|                          | `state_codes.R`                       | Maps Venezuelan state codes to their corresponding names and geographic identifiers for data merging and analysis.                                                                                                                                                                                               |
|                          | `Maps_Venezuela.R`                    | Creates geographical visualizations of electoral outcomes and political strongholds across Venezuelan municipalities and states.                                                                                                                                                                                 |
|                          | `plots_VENPRESA.R`                    | Produces visualizations for the paper's analysis of data blackouts and their potential impact on political behavior.                                                                                                                                                                                             |
| **Validation**           | `comparison_sample_realdata.R`        | Validates the reconstructed sample dataset by comparing it with official electoral results, focusing on discrepancies in vote shares and turnout.                                                                                                                                                                |
|                          | `robustness_intra_center_variation.R` | Evaluates the consistency of vote shares across polling stations within the same center, as a robustness check on intra-center variation. |
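
As an illustration of the kind of check the intra-center robustness script describes, the sketch below measures how much a bloc's vote share varies across polling stations within the same center. It is not the script itself: the data are invented, `center_id` and `of_share_sd` are hypothetical names, and the share is computed over two blocs only for brevity.

```r
# Toy station-level rows: three stations in center c1, two in center c2.
toy_centers <- data.frame(
  center_id = c("c1", "c1", "c1", "c2", "c2"),
  of_c      = c(40, 45, 50, 10, 30),
  op_c      = c(60, 55, 50, 90, 70)
)

# Officialist share per station (two-bloc simplification).
toy_centers$of_share <- toy_centers$of_c / (toy_centers$of_c + toy_centers$op_c)

# Standard deviation of the share across stations within each center.
center_sd <- aggregate(of_share ~ center_id, data = toy_centers, FUN = sd)
names(center_sd)[2] <- "of_share_sd"
```

A small within-center standard deviation would suggest that a single station is a reasonable stand-in for its center, which matters for the 2006--2018 data where only one station per center is available.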

## Validation

We validated our dataset by comparing state-level vote shares to publicly available aggregates (e.g., media, Wikipedia). The results showed minimal discrepancies, supporting the internal consistency and reliability of the reconstructed data.
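
A comparison of this kind could look like the sketch below, which aggregates station-level counts to the state level and measures the absolute gap to an external benchmark. All values are invented, and `state`, `of_share_ref`, and `abs_diff` are hypothetical names; the share is computed over two blocs only to keep the example short.

```r
# Toy station-level rows; state labels and counts are invented.
stations <- data.frame(
  state = c("A", "A", "B", "B"),
  of_c  = c(100, 50, 30, 70),
  op_c  = c(80, 70, 90, 10)
)

# Toy external benchmark (in practice, published state-level shares).
benchmark <- data.frame(state = c("A", "B"), of_share_ref = c(0.51, 0.49))

# Aggregate counts by state and compute the officialist share.
state_tot <- aggregate(cbind(of_c, op_c) ~ state, data = stations, FUN = sum)
state_tot$of_share <- state_tot$of_c / (state_tot$of_c + state_tot$op_c)

# Absolute discrepancy between reconstructed and benchmark shares.
cmp <- merge(state_tot, benchmark, by = "state")
cmp$abs_diff <- abs(cmp$of_share - cmp$of_share_ref)
```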

## Citation

If you use this dataset, please cite:

> Remiro, Luis and Jimenez, Maryhen. (2024). *Replication Data for: Votes and Voids: Reconstructing Electoral Data in Venezuela under Censorship*. Presidential Venezuelan Election Archive (PRESVEN-A) Dataset. Harvard Dataverse. <https://doi.org/10.7910/DVN/NO1XJ2>

**Related article**:\
Remiro, Luis and Jimenez, Maryhen. *Votes and Voids: Reconstructing Electoral Data in Venezuela under Censorship.* Under review at *Perspectives on Politics*.

## Contributors

-   **Luis Remiro** (Universitat Pompeu Fabra)
    -   *Email*: [luis.remiro\@upf.edu](mailto:luis.remiro@upf.edu)
    -   *Website*: <https://luisremiro.netlify.app/>
-   **Maryhen Jimenez** (University of Oxford)
    -   *Email*: [maryhen.jimenezmorales\@area.ox.ac.uk](mailto:maryhen.jimenezmorales@area.ox.ac.uk)
    -   *Website*: <https://www.maryhenjimenez.com>

## Acknowledgments

We would like to express our gratitude to the many individuals in Venezuela who have contributed to the data collection process. Given the authoritarian context, we have chosen to omit identifying details to protect the safety of those involved. However, we especially acknowledge Héctor Briceño, among many others, for their invaluable support in sharing data and insights that have made this work possible.

## Contact Us

The database is continuously being improved. If you have any suggestions, comments, or issues, please contact the authors via email.
